Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in Two Conversational Corpora
Authors: Sheng-Fu Wang et al.
∗ Graduate Institute of Linguistics, National Taiwan University, 3F, Le-Xue Building, No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan, 106. E-mail: {sftwang0416; flower75828; june06029}@gmail.com; [email protected]
+ Department of English, National Taiwan Normal University, No. 162, He-ping East Road, Section 1, Taipei, Taiwan, 106. E-mail: [email protected]
Acknowledgement: Thanks to Wang Chun-Chieh, Liu Chun-Jui, Anna Lofstrand, and Hsu Chan-Chia for their involvement in the construction of the elderly speakers' corpus and the early development of this paper.
Abstract
This study examines how different dimensions of corpus frequency data may affect the outcome of statistical modeling of lexical items. Our analysis focuses mainly on a recently constructed elderly speaker corpus that is used to reveal patterns in aging people's language use. A conversational corpus contributed by speakers in their 20s serves as complementary material. The target words examined are temporal expressions, which might reveal how the speech produced by the elderly is organized. We conduct divisive hierarchical clustering analyses based on two different dimensions of corpus data, namely raw frequency distributions and collocation-based vectors. The target terms cluster in different ways depending on which dimension of the data is used as input, so analyses based on frequency distributions and on collocational patterns yield distinct results. Specifically, statistically based collocational analysis generally produces more distinct clusters, differentiating the temporal terms more finely than analysis based on raw frequency does.
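The contrast between the two clustering inputs named in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the four English terms, all the numbers, the choice of Euclidean distance for raw frequencies and cosine distance for collocation vectors, and the helper names (`split_once`, `mean_distance`) are invented, and the abstract does not say which distance measures or software the authors used. The code performs only the first split of a DIANA-style divisive procedure, not a full hierarchy.

```python
from math import sqrt

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_distance(item, group, dist):
    # Average distance from item to the other members of group.
    others = [g for g in group if g != item]
    return sum(dist[item][o] for o in others) / len(others) if others else 0.0

def split_once(vectors, distance):
    """One DIANA-style split: peel a 'splinter' group off the full set."""
    items = list(vectors)
    dist = {a: {b: distance(vectors[a], vectors[b]) for b in items} for a in items}
    # Seed the splinter group with the item farthest, on average, from the rest.
    seed = max(items, key=lambda i: mean_distance(i, items, dist))
    splinter, rest = [seed], [i for i in items if i != seed]
    while len(rest) > 1:
        # Gain: how much closer an item is to the splinter group than to its own group.
        gains = {i: mean_distance(i, rest, dist)
                    - mean_distance(i, splinter + [i], dist) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return sorted(splinter), sorted(rest)

# Hypothetical raw counts of each term in two sub-corpora (invented numbers).
raw_freq = {"now": [50, 48], "today": [47, 52], "before": [5, 6], "later": [4, 7]}
# Hypothetical association scores with three context words (invented numbers).
colloc = {"now": [0.9, 0.1, 0.0], "today": [0.1, 0.9, 0.1],
          "before": [0.8, 0.2, 0.1], "later": [0.0, 0.8, 0.2]}

freq_split = split_once(raw_freq, euclidean)
colloc_split = split_once(colloc, cosine_distance)
print("frequency-based split:  ", freq_split)
print("collocation-based split:", colloc_split)
```

With these toy numbers the frequency-based split separates frequent terms from infrequent ones, while the collocation-based split groups terms that share contexts. This mirrors, in miniature only, the abstract's point that the choice of input dimension changes the clustering.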
Related resources
Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in an Elderly Speaker Corpus
This study examines how different dimensions of corpus frequency data may affect the outcome of statistical modeling of lexical items. The corpus used in our analysis is an elderly speaker corpus in its early development, and the target words are temporal expressions, which might reveal how the speech produced by the elderly is organized. We conduct divisive hierarchical clustering based on two...
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles
Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...
Applying Statistical Methods to Small Corpora: Benefiting from a Limited Domain
The application of statistical approaches to problems in natural language processing generally requires large (1,000,000+ words) corpora to produce useful results. In this paper we show that a well-known statistical technique, the t test, can be applied to smaller corpora than was previously thought possible, by relying on semantic features rather than lexical items in a corpus of limited domai...
The Comparison of Native English and Persian Elementary School Students’ Performance on Lexical and Grammatical Collocations
The importance and manner of language learning/acquisition have been a great concern for decades. Many factors play important roles in this regard. This research compared the performance of native Persian and English elementary students to see whether there is any significant difference between the two groups and which type of collocation each group performed better on. For ...
Journal title: IJCLCLP
Volume: 17
Publication date: 2012